Abstract
A variety of network models for empirical inference have been introduced, in rudimentary form, as models of neurological computation. Motivated in part by these brain models, and to a greater extent by the need for general-purpose capabilities for empirical estimation and classification, learning network models have been developed and successfully applied to complex engineering problems for at least 25 years. In the statistics community there is considerable interest in similar models for the inference of high-dimensional relationships. In these methods, functions of many variables are estimated by composing functions of more tractable lower-dimensional forms. In this presentation, we describe the commonality as well as the diversity of the network models introduced in these different settings and point toward some new developments.
The use of information-based model selection criteria in the GMDH algorithm
Abstract
The Group Method of Data Handling (GMDH) algorithm is an elegant approach to statistical data modeling. This paper introduces and develops information-based model evaluation criteria for use in the GMDH algorithm to assess the quality of the competing offspring models at each generation. The problem, then, is to integrate into the GMDH algorithm model selection criteria that choose the model which "best" approximates the given data set from among a set of competing alternative models with different numbers of parameters. In order to select a model that fits the data well, a criterion is needed that evaluates each competing alternative model in terms of bias, variability, and goodness-of-fit. The complexity of the selected model must also be considered: the general principle, known as Occam's Razor, is that a parsimonious model is preferable to a more complex one. We therefore propose the use of information-theoretic model selection criteria to facilitate the identification of a parsimonious model, that is, a model providing the highest information gain with the least complexity, within the GMDH framework. A real numerical example is shown, along with an open-architecture symbolic computational toolbox, to illustrate the utility of the proposed approach.
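As an illustration of how such a criterion could be applied at one GMDH generation, the following minimal sketch scores every pairwise offspring model with the Akaike Information Criterion (AIC) and keeps the best one. The quadratic transfer function is the classic GMDH form; the particular AIC variant, the toy data, and the names fit_candidate and aic are illustrative assumptions, not taken from the paper.

import numpy as np
from itertools import combinations

def fit_candidate(x1, x2, y):
    """Fit the classic GMDH quadratic transfer function
    y = a0 + a1*x1 + a2*x2 + a3*x1*x2 + a4*x1^2 + a5*x2^2."""
    X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return coef, rss, X.shape[1]

def aic(rss, n, k):
    # n*log(RSS/n) + 2k: goodness-of-fit plus a complexity penalty (Occam's Razor)
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.5 * X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=200)

# Score every pair of inputs; the offspring with the lowest AIC survives.
scores = []
for i, j in combinations(range(X.shape[1]), 2):
    _, rss, k = fit_candidate(X[:, i], X[:, j], y)
    scores.append((aic(rss, len(y), k), (i, j)))
print(min(scores))  # best (AIC, input pair) at this generation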
Abstract
At present, GMDH algorithms give us the only way to obtain the most accurate approximations of functions and forecasts of random processes and events from noisy and short input samples. Recently developed revised GMDH algorithms use two sorting-out criteria: a basic criterion, followed at the next stage by a discriminating criterion. There are several ways to raise accuracy and extend the validated forecasting lead time. The first is to develop a set of revised GMDH algorithms with different mathematical modelling languages, so that the description adequate to the object's characteristics can be chosen. The second is to use GMDH algorithms as active neurons in a neural net. The third is to unite the GMDH algorithm with a mathematical programming algorithm. An example of such a unification is presented for the case where the number of characteristic variables equals the number of output variables.
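The abstract does not name the two criteria concretely; a common pairing in the GMDH literature is a regularity criterion (error on data withheld from estimation) followed by a minimum-bias criterion (agreement of models fitted on two data halves). The sketch below implements that assumed pairing for a toy set of polynomial candidates.

import numpy as np

def regularity(model, x_val, y_val):
    """Basic criterion: error on data not used for estimation."""
    return float(np.mean((model(x_val) - y_val) ** 2))

def min_bias(model_a, model_b, x_all):
    """Discriminating criterion: disagreement of models fitted on two halves."""
    return float(np.mean((model_a(x_all) - model_b(x_all)) ** 2))

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.2, size=100)
halfA, halfB = slice(0, 50), slice(50, 100)

candidates = []
for degree in (1, 2, 3):
    pA = np.poly1d(np.polyfit(x[halfA], y[halfA], degree))
    pB = np.poly1d(np.polyfit(x[halfB], y[halfB], degree))
    candidates.append({"degree": degree, "on_A": pA, "on_B": pB})

# Stage 1: the basic criterion keeps the best candidates...
survivors = sorted(candidates,
                   key=lambda c: regularity(c["on_A"], x[halfB], y[halfB]))[:2]
# ...Stage 2: the discriminating criterion picks among the survivors.
best = min(survivors, key=lambda c: min_bias(c["on_A"], c["on_B"], x))
print(best["degree"])  # -> 1, the simplest adequate description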
Abstract
Until now, known networks have been built from neurons that are very simple processing units. Such passive neurons are not able to select and estimate their own inputs. In a new approach, which corresponds more closely to the action of the human nervous system, the connections between neurons are not fixed but change in dependence on the neurons themselves. Such active neurons are able, during the learning or self-organizing process, to estimate which inputs are necessary to minimize the given objective function of the neuron. This is only possible on the condition that every neuron is a complex processing unit, such as a GMDH algorithm. The prediction of stock exchange activity is considered as an application of such nets of active neurons.
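A minimal sketch of the idea, under the assumption that the neuron's objective is mean squared error on validation data: the "active" neuron enumerates candidate input subsets, fits a simple linear transfer function on each, and keeps the subset that minimizes its own objective. The class and all names are hypothetical illustrations, not the authors' implementation.

import numpy as np
from itertools import combinations

class ActiveNeuron:
    """A neuron that selects its own inputs during self-organization."""
    def __init__(self, max_inputs=2):
        self.max_inputs = max_inputs
        self.inputs, self.coef = None, None

    def fit(self, X_tr, y_tr, X_val, y_val):
        best = np.inf
        for r in range(1, self.max_inputs + 1):
            for subset in combinations(range(X_tr.shape[1]), r):
                A = np.column_stack([np.ones(len(X_tr)), X_tr[:, subset]])
                coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
                V = np.column_stack([np.ones(len(X_val)), X_val[:, subset]])
                err = float(np.mean((V @ coef - y_val) ** 2))
                if err < best:  # keep the input subset minimizing the objective
                    best, self.inputs, self.coef = err, subset, coef
        return best  # the neuron reports its own achieved objective

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 5))
y = 3.0 * X[:, 1] - X[:, 4] + rng.normal(scale=0.1, size=120)
neuron = ActiveNeuron(max_inputs=2)
neuron.fit(X[:80], y[:80], X[80:], y[80:])
print(neuron.inputs)  # -> (1, 4): the neuron found its relevant inputs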
SelfOrganize! - a software tool for modelling and prediction of complex systems
Abstract
A software tool using the GMDH technique for modelling and prediction of complex linear or nonlinear multi-input/multi-output systems is presented. Key features that make the tool applicable to a broad spectrum of modelling tasks in economics, ecology and other fields are described, such as the use of the cross-validation principle, the procedure for selecting intermediate input variables, and the avoidance of conflicts during the synthesis of a system of equations. Results from a preliminary model of the national economy of the Federal Republic of Germany constructed with this tool are shown to give an overview of the efficiency and flexibility of GMDH.
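The cross-validation principle mentioned above can be sketched as follows: a candidate intermediate variable is kept only if it improves the model's out-of-fold error. Plain k-fold cross-validation with a linear partial model is an assumption here; the tool's actual procedure is not detailed in the abstract.

import numpy as np

def cv_score(X, y, k=5):
    """Mean out-of-fold MSE of a linear model built on the columns of X."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        A = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        V = np.column_stack([np.ones(len(fold)), X[fold]])
        errs.append(float(np.mean((V @ coef - y[fold]) ** 2)))
    return float(np.mean(errs))

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
y = X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=100)
# An intermediate variable is kept only if adding it improves the CV score.
print(cv_score(X[:, :2], y), cv_score(X, y))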
Note: the name "SelfOrganize!" is no longer in use; the tool was renamed "KnowledgeMiner".
Knowledge Extraction from Data Using Self-Organizing Modeling Technologies
Abstract
Today, knowledge extraction from data (also referred to as Data Mining) plays an increasing role in sifting important information from existing data. Commonly, regression-based methods such as statistical models or Artificial Neural Networks, as well as rule-based techniques such as fuzzy logic and genetic algorithms, are used.
This paper describes two methods working on the cybernetic principles of self-organization: the Group Method of Data Handling (GMDH) and Analog Complexing. GMDH combines the best of both statistics and Neural Networks: it adaptively creates models from data in the form of networks of optimized transfer functions (Active Neurons), in an evolutionary fashion of repeated generation of populations of alternative models of growing complexity, with corresponding model validation and survival-of-the-fittest selection, until an optimally complex model has been created. Analog Complexing yields nonparametric models: from a given set of variables it selects one or more patterns of the trajectory of past behavior that are analogous to a chosen reference pattern.
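A minimal sketch of the Analog Complexing step just described: past windows of a trajectory are compared with the current reference pattern, and the continuations of the most similar ("analogous") windows are combined into a forecast. The normalization, the Euclidean similarity measure, and all parameter names are assumptions for illustration only.

import numpy as np

def analog_forecast(series, width=12, horizon=3, n_analogs=3):
    """Combine the continuations of the past patterns most similar
    to the current reference pattern (the last `width` points)."""
    ref = series[-width:]
    norm = lambda w: (w - w.mean()) / (w.std() + 1e-12)
    dists = sorted(
        (float(np.linalg.norm(norm(series[s:s + width]) - norm(ref))), s)
        for s in range(len(series) - width - horizon))
    conts = []
    for _, s in dists[:n_analogs]:
        w = series[s:s + width]
        c = (series[s + width:s + width + horizon] - w.mean()) / (w.std() + 1e-12)
        conts.append(c * ref.std() + ref.mean())  # rescale to the reference level
    return np.mean(conts, axis=0)

rng = np.random.default_rng(4)
t = np.arange(300)
series = np.sin(2 * np.pi * t / 25) + 0.1 * rng.normal(size=300)
print(analog_forecast(series))  # three-step nonparametric prediction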
Both approaches have been developed for complex systems modelling, prediction, identification and approximation of multivariate processes, diagnostics, pattern recognition and clustering of data samples, and they are implemented in the KnowledgeMiner modelling software tool. They can be applied to problems in economics (e.g., macroeconomics, marketing, finance), ecology (e.g., water and air pollution problems), the social sciences, medicine (diagnosis and classification problems) and other fields.
Abstract
This paper describes the application of data mining algorithms in a portfolio trading system. The goal of data mining in this case is the prediction of the assets of a portfolio by means of parametric or nonparametric models. Parametric models are adaptively created from data by the Group Method of Data Handling (GMDH) in the form of networks of optimized transfer functions (Active Neurons). Nonparametric models are selected by Analog Complexing from a given set of variables and represent one or more patterns of the trajectory of past behavior that are analogous to a chosen reference pattern. Both approaches to self-organizing modeling include not only core data mining algorithms but also an iterative process of generating alternative models of growing complexity, evaluating and validating them, and selecting a model of optimal complexity. These approaches are therefore denoted in this paper as self-organizing data mining.
In a modeling/prediction module, self-organizing data mining is performed to extract and synthesize hidden knowledge from a given data set systematically, quickly, and in explicitly visible form. The control module of the trading system is responsible for signal generation based on the predictions provided by the modeling module (see the sketch below).
Initial performance results of the trading system are presented. The system simulates trading a portfolio of diverse stocks using daily out-of-sample price data.
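The paper does not spell out the control module's rules; the following minimal sketch shows one plausible threshold rule that turns a one-step price prediction into a buy/sell/hold signal. The threshold value and the function name trading_signal are hypothetical.

def trading_signal(price_today: float, price_predicted: float,
                   threshold: float = 0.005) -> int:
    """Return +1 (buy), -1 (sell) or 0 (hold) from a one-step prediction."""
    expected_return = price_predicted / price_today - 1.0
    if expected_return > threshold:
        return 1
    if expected_return < -threshold:
        return -1
    return 0

print(trading_signal(100.0, 101.2))  # -> 1 (predicted +1.2% beats the 0.5% threshold)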
Abstract
Knowledge extraction from data using inductive methods like GMDH has advantages in the modelling of rather complex and ill-defined objects with fuzzy characteristics, and for noisy and extremely short data samples. Using a GMDH algorithm that additionally optimizes the nonlinear partial models, an example of the analysis of systems of characteristics is presented.
Abstract
At present, GMDH algorithms give us a way to identify and forecast economic processes from noisy and short input samples. In contrast to neural networks, the results are explicit mathematical models, obtained in a relatively short time. For ill-defined objects with very high noise levels, better results are obtained by Analog Complexing methods. Nets with active neurons should be applied to increase accuracy. Active neurons are able, during the self-organizing process, to estimate which inputs are necessary to minimize a given objective function of the neuron. A neural net built from such neurons has a twofold multilayered structure: the neurons themselves are multilayered, and they in turn are united into a multilayered network (see the sketch below).
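A compact sketch of this twice-multilayered structure, under simplifying assumptions: each neuron is itself a small multilayered GMDH model built from pairwise polynomial transfer functions, and the neurons' outputs feed an outer layer. Selection here uses training error for brevity, whereas a real GMDH neuron would apply an external criterion on separate data; all names and settings are illustrative.

import numpy as np
from itertools import combinations

def gmdh_neuron(X, y, layers=2, survivors=3):
    """One active neuron: internally a small multilayered GMDH model."""
    Z = X
    for _ in range(layers):
        cands = []
        for i, j in combinations(range(Z.shape[1]), 2):
            A = np.column_stack([np.ones(len(y)), Z[:, i], Z[:, j], Z[:, i] * Z[:, j]])
            c, *_ = np.linalg.lstsq(A, y, rcond=None)
            out = A @ c
            cands.append((float(np.mean((out - y) ** 2)), out))
        cands.sort(key=lambda t: t[0])
        Z = np.column_stack([out for _, out in cands[:survivors]])
    return Z[:, 0]  # best output of the inner (neuron-level) net

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 5))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=150)
# Outer network: outputs of inner neurons feed the next (outer) layer.
layer1 = np.column_stack([gmdh_neuron(X[:, :3], y), gmdh_neuron(X[:, 2:], y)])
output = gmdh_neuron(np.column_stack([layer1, X]), y)
print(float(np.mean((output - y) ** 2)))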
KnowledgeMiner is an easy-to-use modelling tool which realizes such twice-multilayered neural nets and enables the creation of time series models as well as multi-input/single-output and multi-input/multi-output systems (systems of equations). Successful applications in the analysis and prediction of stock market characteristics for financial risk control modelling are shown.
Neural Networks and Statistical Models
Warren S. Sarle
Abstract
There has been much publicity about the ability of artificial neural networks to learn and to generalize. In fact, the most commonly used artificial neural networks, called multilayer perceptrons, are nothing more than nonlinear regression and discriminant models that can be implemented with standard statistical software. This paper explains what neural networks are, translates neural network jargon into statistical jargon, and shows the relationships between neural networks and statistical models such as generalized linear models, maximum redundancy analysis, projection pursuit, and cluster analysis.
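To make the paper's central point concrete, the sketch below writes a one-hidden-layer perceptron explicitly as the nonlinear regression model y = b + sum_h v_h * tanh(w_h . x + c_h) + error and fits it by least squares with a general-purpose optimizer. The use of scipy, the network size, and the toy data are illustrative choices, not taken from the paper.

import numpy as np
from scipy.optimize import minimize

def mlp(params, X, hidden=3):
    """A multilayer perceptron viewed as a parametric regression function."""
    d = X.shape[1]
    W = params[:hidden * d].reshape(hidden, d)       # input-to-hidden weights
    c = params[hidden * d:hidden * (d + 1)]          # hidden biases
    v = params[hidden * (d + 1):hidden * (d + 2)]    # hidden-to-output weights
    b = params[-1]                                   # output bias
    return np.tanh(X @ W.T + c) @ v + b

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] + rng.normal(scale=0.05, size=200)

# Fit by minimizing squared error, exactly as in nonlinear regression.
n_params = 3 * 2 + 3 + 3 + 1
res = minimize(lambda p: np.mean((mlp(p, X) - y) ** 2),
               rng.normal(scale=0.5, size=n_params), method="BFGS")
print(res.fun)  # residual mean squared error of the fitted "network"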
Abstract
Research is presented that belongs to physical-geographical process studies on the control and planned management of geo-ecosystems, and to the analysis, modelling and prediction of matter transport and turnover in different landscapes. On the basis of time series characterizing the weather, soil properties and runoff, the dependence of geo-ecological processes on selected parameters of the meteorological regime, the upper soil horizon and the soil properties is investigated. In addition to correlation and regression analysis, the self-organization of mathematical models is applied. The results obtained confirm the capability of automatic model generation based on GMDH algorithms, particularly when the a priori information about the system and the essential qualitative influence variables is insufficient.
Contact:
muellerj@informatik.htw-dresden.de (Prof. J.-A. Mueller, author of many papers related to self-organizing modeling. He has also developed algorithms to make Analog Complexing usable for evolutionary processes.)